{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Part 8 - Transfer Learning.ipynb",
      "version": "0.3.2",
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "metadata": {
        "id": "Z_S_khL4oH5h",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# Transfer Learning\n",
        "\n",
        "In this notebook, you'll learn how to use pre-trained networks to solved challenging problems in computer vision. Specifically, you'll use networks trained on [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html). \n",
        "\n",
        "ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks using an architecture called convolutional layers. I'm not going to get into the details of convolutional networks here, but if you want to learn more about them, please [watch this](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).\n",
        "\n",
        "Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.\n",
        "\n",
        "With `torchvision.models` you can download these pre-trained networks and use them in your applications. We'll include `models` in our imports now."
      ]
    },
    {
      "metadata": {
        "id": "7WYJEvpMoa04",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 121
        },
        "outputId": "41b1f213-3291-43c9-a6bc-d22e5ea28063"
      },
      "cell_type": "code",
      "source": [
        "!pip install torch torchvision\n",
        "!pip install Pillow"
      ],
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Requirement already satisfied: torch in /usr/local/lib/python3.6/dist-packages (0.4.1)\n",
            "Requirement already satisfied: torchvision in /usr/local/lib/python3.6/dist-packages (0.2.1)\n",
            "Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from torchvision) (1.14.6)\n",
            "Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from torchvision) (1.11.0)\n",
            "Requirement already satisfied: pillow>=4.1.1 in /usr/local/lib/python3.6/dist-packages (from torchvision) (5.3.0)\n",
            "Requirement already satisfied: Pillow in /usr/local/lib/python3.6/dist-packages (5.3.0)\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "PghCX6nRoH5n",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "#%matplotlib inline\n",
        "%config InlineBackend.figure_format = 'retina'\n",
        "\n",
        "import matplotlib.pyplot as plt\n",
        "import os\n",
        "import torch\n",
        "from torch import nn\n",
        "from torch import optim\n",
        "import torch.nn.functional as F\n",
        "from torchvision import datasets, transforms, models\n",
        "from collections import OrderedDict"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "JyXdlwvupX20",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 34
        },
        "outputId": "252ec60f-991a-4470-9a96-bd97e8a613b6"
      },
      "cell_type": "code",
      "source": [
        "if not os.path.exists('./Cat_Dog_data.zip'):\n",
        "    !wget https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip\n",
        "    !unzip Cat_Dog_data.zip\n",
        "    print ('Downloaded and unzipped data successfully!')\n",
        "else:\n",
        "    print ('Requirement already satisfied!')"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Requirement already satisfied!\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "r17CyFMHoH5z",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately, the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`."
      ]
    },
    {
      "metadata": {
        "id": "CW27urlLoH51",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "data_dir = 'Cat_Dog_data'\n",
        "\n",
        "# TODO: Define transforms for the training data and testing data\n",
        "train_transforms = transforms.Compose([transforms.RandomRotation(30),\n",
        "                                       transforms.RandomResizedCrop(224),\n",
        "                                       transforms.RandomHorizontalFlip(),\n",
        "                                       transforms.ToTensor(),\n",
        "                                       transforms.Normalize([0.485, 0.456, 0.406],\n",
        "                                                            [0.229, 0.224, 0.225])])\n",
        "\n",
        "test_transforms = transforms.Compose([transforms.Resize(255),\n",
        "                                      transforms.CenterCrop(224),\n",
        "                                      transforms.ToTensor(),\n",
        "                                      transforms.Normalize([0.485, 0.456, 0.406],\n",
        "                                                           [0.229, 0.224, 0.225])])\n",
        "\n",
        "# Pass transforms in here, then run the next cell to see how the transforms look\n",
        "train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)\n",
        "test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)\n",
        "\n",
        "trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)\n",
        "testloader = torch.utils.data.DataLoader(test_data, batch_size=64)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "yP0T3xwRoH58",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on."
      ]
    },
    {
      "metadata": {
        "scrolled": true,
        "id": "otpbv8pioH6B",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 8755
        },
        "outputId": "2aae5410-1fa8-464a-881d-aa16de78fd17"
      },
      "cell_type": "code",
      "source": [
        "model = models.densenet121(pretrained=True)\n",
        "model"
      ],
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "/usr/local/lib/python3.6/dist-packages/torchvision/models/densenet.py:212: UserWarning: nn.init.kaiming_normal is now deprecated in favor of nn.init.kaiming_normal_.\n",
            "  nn.init.kaiming_normal(m.weight.data)\n"
          ],
          "name": "stderr"
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "DenseNet(\n",
              "  (features): Sequential(\n",
              "    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)\n",
              "    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "    (relu0): ReLU(inplace)\n",
              "    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)\n",
              "    (denseblock1): _DenseBlock(\n",
              "      (denselayer1): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer2): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(96, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer3): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer4): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer5): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer6): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "    )\n",
              "    (transition1): _Transition(\n",
              "      (norm): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "      (relu): ReLU(inplace)\n",
              "      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)\n",
              "    )\n",
              "    (denseblock2): _DenseBlock(\n",
              "      (denselayer1): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer2): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer3): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer4): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer5): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer6): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(288, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer7): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer8): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(352, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(352, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer9): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(384, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer10): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(416, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(416, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer11): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(448, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(448, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer12): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(480, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "    )\n",
              "    (transition2): _Transition(\n",
              "      (norm): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "      (relu): ReLU(inplace)\n",
              "      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)\n",
              "    )\n",
              "    (denseblock3): _DenseBlock(\n",
              "      (denselayer1): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer2): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(288, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer3): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer4): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(352, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(352, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer5): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(384, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer6): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(416, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(416, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer7): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(448, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(448, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer8): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(480, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer9): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer10): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(544, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(544, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer11): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer12): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer13): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(640, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer14): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(672, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer15): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(704, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(704, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer16): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(736, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(736, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer17): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer18): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(800, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer19): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(832, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer20): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(864, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(864, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer21): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(896, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(896, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer22): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(928, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(928, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer23): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(960, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer24): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "    )\n",
              "    (transition3): _Transition(\n",
              "      (norm): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "      (relu): ReLU(inplace)\n",
              "      (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)\n",
              "    )\n",
              "    (denseblock4): _DenseBlock(\n",
              "      (denselayer1): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer2): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(544, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(544, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer3): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer4): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer5): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(640, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer6): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(672, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer7): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(704, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(704, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer8): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(736, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(736, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer9): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer10): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(800, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer11): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(832, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer12): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(864, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(864, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer13): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(896, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(896, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer14): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(928, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(928, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer15): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(960, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "      (denselayer16): _DenseLayer(\n",
              "        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu1): ReLU(inplace)\n",
              "        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
              "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "        (relu2): ReLU(inplace)\n",
              "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
              "      )\n",
              "    )\n",
              "    (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
              "  )\n",
              "  (classifier): Linear(in_features=1024, out_features=1000, bias=True)\n",
              ")"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 5
        }
      ]
    },
    {
      "metadata": {
        "id": "4OqfVTMCoH6Q",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "This model is built out of two main parts, the features and the classifier. The features part is a stack of convolutional layers and overall works as a feature detector that can be fed into a classifier. The classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier, but the features will work perfectly on their own. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers."
      ]
    },
    {
      "metadata": {
        "id": "QusAVG9moH6S",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# Freeze parameters so we don't backprop through them\n",
        "for param in model.parameters():\n",
        "    param.requires_grad = False\n",
        "\n",
        "from collections import OrderedDict\n",
        "classifier = nn.Sequential(OrderedDict([\n",
        "                          ('fc1', nn.Linear(1024, 500)),\n",
        "                          ('relu', nn.ReLU()),\n",
        "                          ('fc2', nn.Linear(500, 2)),\n",
        "                          ('output', nn.LogSoftmax(dim=1))\n",
        "                          ]))\n",
        "    \n",
        "model.classifier = classifier"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "1sUTxgkeoH6Y",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU like normal, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The linear algebra computations are done in parallel on the GPU leading to 100x increased training speeds. It's also possible to train on multiple GPUs, further decreasing training time.\n",
        "\n",
        "PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backwards passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. You can move them back from the GPU with `model.to('cpu')` which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU."
      ]
    },
    {
      "metadata": {
        "id": "FjWTE7GYoH6b",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "import time"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "LK4qDSSJoH6j",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 52
        },
        "outputId": "d32ae736-fc8e-4c90-d6bc-e90169947a44"
      },
      "cell_type": "code",
      "source": [
        "for device in ['cpu', 'cuda']:\n",
        "\n",
        "    criterion = nn.NLLLoss()\n",
        "    # Only train the classifier parameters, feature parameters are frozen\n",
        "    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)\n",
        "\n",
        "    model.to(device)\n",
        "\n",
        "    for ii, (inputs, labels) in enumerate(trainloader):\n",
        "\n",
        "        # Move input and label tensors to the GPU\n",
        "        inputs, labels = inputs.to(device), labels.to(device)\n",
        "\n",
        "        start = time.time()\n",
        "\n",
        "        outputs = model.forward(inputs)\n",
        "        loss = criterion(outputs, labels)\n",
        "        loss.backward()\n",
        "        optimizer.step()\n",
        "\n",
        "        if ii==3:\n",
        "            break\n",
        "        \n",
        "    print(f\"Device = {device}; Time per batch: {(time.time() - start)/3:.3f} seconds\")"
      ],
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Device = cpu; Time per batch: 7.543 seconds\n",
            "Device = cuda; Time per batch: 0.018 seconds\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "wtLgru7ooH6v",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "You can write device agnostic code which will automatically use CUDA if it's enabled like so:\n",
        "```python\n",
        "# at beginning of the script\n",
        "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
        "\n",
        "...\n",
        "\n",
        "# then whenever you get a new Tensor or Module\n",
        "# this won't copy if they are already on the desired device\n",
        "input = data.to(device)\n",
        "model = MyModule(...).to(device)\n",
        "```\n",
        "\n",
        "From here, I'll let you finish training the model. The process is the same as before except now your model is much more powerful. You should get better than 95% accuracy easily.\n",
        "\n",
        ">**Exercise:** Train a pretrained models to classify the cat and dog images. Continue with the DenseNet model, or try ResNet, it's also a good model to try out first. Make sure you are only training the classifier and the parameters for the features part are frozen."
      ]
    },
    {
      "metadata": {
        "id": "xdOKYaT9oH6w",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 104
        },
        "outputId": "6998980b-7658-4a17-aec1-7f31d75bdd79"
      },
      "cell_type": "code",
      "source": [
        "## TODO: Use a pretrained model to classify the cat and dog images\n",
        "criterion = nn.NLLLoss()\n",
        "optimizer = optim.Adam(model.classifier.parameters(), lr=0.005)\n",
        "\n",
        "epochs = 5\n",
        "\n",
        "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
        "model = model.to(device)\n",
        "\n",
        "train_losses = []\n",
        "test_losses = []\n",
        "\n",
        "for e in range(epochs):\n",
        "    running_loss = 0\n",
        "    for images, labels in trainloader:\n",
        "        images, labels = images.to(device), labels.to(device)\n",
        "        optimizer.zero_grad()\n",
        "        \n",
        "        log_ps = model(images)\n",
        "        loss = criterion(log_ps, labels)\n",
        "        loss.backward()\n",
        "        \n",
        "        optimizer.step()\n",
        "        running_loss += loss.item()\n",
        "        \n",
        "    else:\n",
        "        with torch.no_grad():\n",
        "            model.eval()\n",
        "            accuracy = 0\n",
        "            test_loss = 0\n",
        "            for images, labels in testloader:\n",
        "                images, labels = images.to(device), labels.to(device)\n",
        "                log_ps = model(images)\n",
        "                loss = criterion(log_ps, labels)\n",
        "                test_loss += loss.item()\n",
        "                ps = torch.exp(log_ps)\n",
        "                \n",
        "                top_k, top_class = ps.topk(1, dim=1)\n",
        "                equals = top_class == labels.view(*top_class.shape)\n",
        "                accuracy += torch.mean(equals.type(torch.FloatTensor))\n",
        "                \n",
        "        train_losses.append(running_loss / len(trainloader))\n",
        "        test_losses.append(test_loss / len(testloader))\n",
        "\n",
        "        print ('Epochs {}/{}'.format(e+1, epochs),\n",
        "               'Train Loss: {:.3f}'.format(train_losses[-1]),\n",
        "               'Test Loss: {:.3f}'.format(test_losses[-1]),\n",
        "               'Accuracy {:.3f}'.format((accuracy / len(testloader)) * 100 ))\n",
        "        \n",
        "        model.train()"
      ],
      "execution_count": 25,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Epochs 1/5 Train Loss: 0.155 Test Loss: 0.052 Accuracy 98.086\n",
            "Epochs 2/5 Train Loss: 0.163 Test Loss: 0.041 Accuracy 98.477\n",
            "Epochs 3/5 Train Loss: 0.153 Test Loss: 0.041 Accuracy 98.398\n",
            "Epochs 4/5 Train Loss: 0.144 Test Loss: 0.042 Accuracy 98.320\n",
            "Epochs 5/5 Train Loss: 0.136 Test Loss: 0.038 Accuracy 98.359\n"
          ],
          "name": "stdout"
        }
      ]
    }
  ]
}